[v2] ClientSession runs on JSONRPCDispatcher; BaseSession removed#2838
Open
maxisbey wants to merge 23 commits into
Open
[v2] ClientSession runs on JSONRPCDispatcher; BaseSession removed#2838maxisbey wants to merge 23 commits into
maxisbey wants to merge 23 commits into
Conversation
aec8155 to
a51357c
Compare
6 tasks
- New CallOptions key cancel_on_abandon (default true): abandoning a request
(timeout or caller cancellation) sends notifications/cancelled unless the
caller opted out or the request carries resumption hints
- Bound the two shielded cancellation-path writes with a 5s deadline so a
wedged transport write cannot hang shutdown or a cancelled caller
- Capitalize the connection-closed fan-out message ("Connection closed")
- Pin the server-seat timeout contract in the interaction suite: a timed-out
server-initiated request is followed by notifications/cancelled
A raising notification handler ran as a bare task in the dispatcher's task group, so its exception cancelled the read loop and every in-flight request. Wrap spawned handlers in the same containment boundary progress callbacks already have: log the failure and keep the connection serving.
Transports yield Exception items on the read stream for connection faults and parse errors; the dispatcher debug-logged and dropped them. An optional observer now receives them (awaited in the read loop, contained so a raising observer costs the item, not the connection). Unset keeps the old behavior.
TransportT now defaults to TransportContext (PEP 696, same pattern as shared/context.py), so omitting transport_builder no longer needs a dedicated overload to pin the type parameter.
ClientSession keeps its public surface (constructor, typed methods, manual initialize, context-manager lifecycle) but now owns a JSONRPCDispatcher instead of inheriting the v1 BaseSession receive loop. Server-initiated requests are answered through the existing callbacks via the closed-union parse; notifications validate-or-drop and tee to message_handler; transport exceptions reach message_handler through the dispatcher's stream-exception observer. A from_dispatcher constructor accepts a pre-built dispatcher for in-process embedding. mcp.shared.session shrinks to the surviving names: the ProgressFnT re-export and a typing-only RequestResponder stub for MessageHandlerFnT annotations. Behavior changes (deliberate, to be covered in the migration guide): - request ids count from 1; the progress token follows - timeouts use the dispatcher error text and send notifications/cancelled, so a timed-out server handler is interrupted instead of running on - responses with unknown ids are ignored per spec instead of surfacing a RuntimeError to message_handler - a raising request callback is answered with code 0 and the exception text - notification callbacks run concurrently (no completion-before-response) Three interaction-suite divergence entries are resolved and deleted, and the server-to-client cancellation requirement is now pinned by a passing test.
Adds tests for ServerMessageMetadata routing, related-request-id notifications, and params-absent inbound requests over direct dispatch, plus the migration-guide entry for the ClientSession dispatcher swap.
Transport teardown closes the standalone stream's send side first, so a writer parked in receive() ends on a clean end-of-stream; but when teardown lands while the writer is between dequeues, the next receive() raises ClosedResourceError, which fell into the catch-all and logged a traceback at ERROR level for a routine disconnect. Catch it and end quietly. A new test pins the close ordering that keeps the parked path clean.
Replaces the from_dispatcher classmethod: read_stream/write_stream become optional and dispatcher is a keyword-only alternative, with mutual exclusion validated at construction. Drops the __new__-based alternate constructor and its shared state-init helper.
Covers MCPError.from_jsonrpc_error and the context-stream sync close() methods, whose only exercisers died with BaseSession and its tests, and restructures three test handler arms that could never take their false branch.
- Bound the timeout-path courtesy cancel like the other abandon writes and extract a single _final_write policy: 5s for courtesy cancels, 1s for shutdown responses so closing a session stays fast on a wedged transport - Answer shutdown-interrupted requests with CONNECTION_CLOSED and retire the REQUEST_CANCELLED constant (-32002 collides with resource-not-found) - Key the peer-cancel error response on cancelled_caught so a cancel landing after the handler finished cannot produce a second answer for the same id - Decide outbound metadata and cancel-on-abandon suppression in one place: only resumption hints that actually reach the transport suppress the courtesy cancel - Never send notifications/cancelled for a request whose write never completed - Identity-guard the in-flight pop so a finished handler cannot evict a newer entry that reused its request id - Map request-write stream failures to MCPError(CONNECTION_CLOSED); warn when a bounded final write gives up; mark the Dispatcher lifecycle provisional
- Unwind the entered task group when __aenter__ is cancelled while the dispatcher is starting, instead of abandoning its cancel scope - Deliver transport-level exceptions to message_handler concurrently and contained, like notifications, so a handler that awaits session I/O no longer deadlocks the read loop - Route related_request_id=0 correctly in send_notification (ids are opaque) - Document the dispatcher= constructor path in send_request's contract
- Add an interaction test proving a timed-out initialize sends no notifications/cancelled, and drop the stale deferred rationale for the initialize-not-cancellable requirement - Record null-id error-response drops in the requirements manifest and describe the cancelled-request error response as applying to both seats - Record wire messages only after the inner send succeeds, so a failed or cancelled write is not logged as sent - Bound the remaining indefinite waits, refresh comments stale since the courtesy-cancel change, and assert the issue-88 timeout by error code - Rewrite the suite README's concurrency section for the dispatcher model
Drive the between-dequeues teardown path directly through the transport's ASGI entry point with a gated send, so the ClosedResourceError arm is covered by a real test and no longer needs its coverage pragma. The e2e teardown test's docstring now claims only what its assertion proves.
Cut the comment and docstring volume roughly in half: single-sentence docstring summaries, Raises sections kept but shortened, inline narration replaced by one-line statements of the non-inferable constraint, and development-artifact comments removed. No code changes.
ServerMessageMetadata.related_request_id exists for the server's streamable-HTTP transport to route outbound messages onto the originating request's SSE stream. No client transport has ever serialized it, so ClientSession's related_request_id parameter and ServerMessageMetadata acceptance were dead inheritance from the shared v1 BaseSession. - send_notification loses its related_request_id parameter - send_request's metadata narrows to ClientMessageMetadata | None (resumption hints, the live part) - the isinstance(dispatcher) downcasts those parameters forced are gone Progress and response correlation (progressToken in params, JSON-RPC id) are payload-level mechanisms and are unaffected.
RequestContext[ClientSession] was the only instantiation of the generic left in the tree (the server seat has ServerRequestContext), so the public ClientRequestContext alias becomes the real dataclass and the private mcp.shared._context module is deleted. request_id is now always populated: the client only builds a context for inbound requests, and ping is answered before any context exists.
Server-initiated sampling/elicitation/roots requests over a ClientSession built with dispatcher=DirectDispatcher failed before the callback ran: the session requires a populated request id and direct dispatch carried none. DirectDispatcher now assigns per-instance monotonic ids to inbound requests (notifications keep None, which is how middleware distinguishes them). Adds a non-ping direct-dispatch test and bounds the indefinite awaits in the existing dispatcher= tests.
send_raw_request raised RuntimeError both before run() and after the transport closed, contradicting the documented contract that connection loss surfaces as MCPError(CONNECTION_CLOSED). A closed flag now separates the two states: RuntimeError remains for use before run(), and a request after EOF gets the same CONNECTION_CLOSED error the in-flight waiters receive.
The previous canary (a raising message_handler) could never fire: correct code drops unknown-id responses before any handler, and regressed code's delivery paths contain handler exceptions. Collect surfaced messages instead and assert only a control notification arrives; verified against a simulation of the v1 surface-as-RuntimeError behavior.
ac209ab to
1703d42
Compare
Cut the ClientSession section by about a third: merged single-concern bullets (stray responses; the timeout exemptions), dropped rationale padding, reframed the shutdown bullet as new-versus-v1 (v1 never answered at shutdown), and removed the REQUEST_CANCELLED bullet entirely - the constant never existed in v1, so it has no place in a v1-to-v2 guide.
run() entered both streams inside its own task group, so at teardown the write stream closed before in-flight handlers sent their final answers: the shutdown CONNECTION_CLOSED response was deterministically dropped on the EOF path and raced the close on the cancel path. The write stream's scope now wraps the task group, so scope exits order the join strictly before the close and teardown writes always land. The shutdown-delivery test becomes a real memory-stream pin, and the wedged-shutdown test's synthetic stream is replaced by a plain unread one.
Two rendezvous tests: three concurrent outbound tool calls proven simultaneously in flight and resolved out of order to their own callers, and two overlapping server-initiated sampling requests serviced concurrently by the client's callbacks (the v1 receive loop serialized these). Also note in the migration guide that delivery concurrency is unbounded.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
V2 client receive path:
Transport -> JSONRPCDispatcher -> ClientSession callbacks, replacing theBaseSessionreceive loop. Companion to #2710, which did the same for the server seat;BaseSessionis now deleted.Motivation and Context
ClientSessionwas the last consumer of the v1BaseSessionengine. Moving it onto the dispatcher unifies both seats on one receive path and fixes the v1 client's structural problems: server-initiated requests were handled inline in the receive loop (a slow sampling callback blocked everything, a callback that sent a request deadlocked, and peer cancellation was unactionable), and a raising notification handler could take down the connection.ClientSession's public surface is unchanged: same constructor, same typed methods, manualinitialize(), same context-manager lifecycle. A new keyword-onlydispatcher=constructor argument accepts a pre-built dispatcher instead of the stream pair (e.g.DirectDispatcherfor in-process embedding).What changed
CallOptions.cancel_on_abandon+ suppression: abandoning a request (timeout or caller cancellation) sends a courtesynotifications/cancelled, unless the caller opted out (initialize, which the spec forbids cancelling) or the request carries resumption hints (the peer's work must survive for the resume)on_stream_exceptionobserver are contained at the dispatcher: a raising handler costs itself, never the connection (matching the TypeScript/C#/Go engines)ClientSessioninternals rewritten over the dispatcher;mcp.shared.sessionshrinks to a compatibility module (ProgressFnTre-export, typing-onlyRequestResponderstub forMessageHandlerFnTannotations)Parity bar
The transport-parametrized interaction suite passes. Three recorded divergences are now resolved and deleted from the requirements manifest (
protocol:timeout:sends-cancellation,protocol:cancel:late-response-ignored,protocol:cancel:server-to-client), and the server-to-client cancellation requirement is pinned by a new test passing over all three transports.Breaking Changes
Documented in
docs/migration.md(one grouped entry):Request 'tools/call' timed out, and a timed-out or abandoned request sendsnotifications/cancelled, interrupting the server handlerRuntimeErrortomessage_handlerMETHOD_NOT_FOUND(carrying the method indata), completing Fix unknown-method error code and add a protocol version registry #2836's contract on the client seatcode=0and the exception text (previously flattened toINVALID_PARAMS)send_requestbefore entering the context manager raisesRuntimeErrorimmediatelyFixes #2489. Fixes #2507. Supersedes #2490.
Related, not closed here: #2673 (a raising request callback now answers the server with
code=0and the exception text instead ofINVALID_PARAMS; the opt-in propagation of the exception to the localcall_toolawaiter that the issue requests stays open as a separate decision) and #2610 (server-seat bug on a code path that no longer exists onmainafter #2710; still open forv1.xtriage — not addressed here).How Has This Been Tested?
Full suite (1741 tests) including the interaction suite over in-memory, SSE, and streamable HTTP; 100% branch coverage held on all changed files. New tests: courtesy-cancel wire pins and suppression, wedged-transport shutdown bound (trio virtual clock), notification-handler containment, stream-exception observer, the
dispatcher=constructor over direct dispatch, server-seat timeout cancellation, abandoned-server-request cancellation, standalone-stream teardown ordering.Types of changes
Checklist
AI Disclaimer